This paper addresses the problem of Approximate Nearest Neighbor (ANN) search in pattern recognition, where feature vectors in a database are encoded as compact codes in order to speed up similarity search in large-scale databases. Considering the ANN problem from an information-theoretic perspective, we interpret it as an encoding that maps the original feature vectors to a less entropic, sparse representation while requiring them to remain as informative as possible. We then define the coding gain for ANN search using information-theoretic measures. We next show that the classical approach to this problem, which consists of binarizing the projected vectors, is sub-optimal. Instead, a properly designed ternary encoding achieves higher coding gains and lower complexity.
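To make the contrast concrete, the sketch below illustrates the two encoding schemes the abstract refers to: classical binarization takes the sign of random projections, while a ternary encoding adds a dead zone around zero so that small projections map to 0, producing a sparser code alphabet {-1, 0, +1}. The random projection matrix, the threshold value, and the function names here are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(x, W):
    """Classical approach: sign of projections -> codes in {-1, +1}."""
    return np.sign(W @ x)

def ternarize(x, W, tau=0.5):
    """Ternary alternative: projections with magnitude below tau map
    to 0, yielding sparse codes in {-1, 0, +1}.
    tau is an illustrative threshold, not a value from the paper."""
    p = W @ x
    return np.sign(p) * (np.abs(p) > tau)

d, m = 16, 8                      # input dimension, code length (illustrative)
W = rng.standard_normal((m, d))   # random projection matrix (illustrative)
x = rng.standard_normal(d)        # a feature vector from the database

b = binarize(x, W)
t = ternarize(x, W)
assert set(np.unique(b)) <= {-1.0, 1.0}        # binary alphabet
assert set(np.unique(t)) <= {-1.0, 0.0, 1.0}   # ternary, sparser alphabet
```

The dead zone is what lowers the entropy of the representation: components near zero, which carry little discriminative information, are set to 0 instead of being forced to ±1.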